STAT331_Final_Project

Author

Justus Nunez, Daniel Alvarez, Matt Babb, Maya Doitch

I. Introduction

This report investigated the relationship between child mortality and carbon dioxide emissions per capita over the years of 1800 – 2022. Two data sets were acquired from Gapminder, which is an organization that seeks to educate the world’s transmitters of information with up-to-date facts about many metrics of health, infrastructure, sustainability, and many other crucial topics.

The first data set, called child_mortality_data, lists the rates of child death under the age of five per 1,000 births from 197 countries (data begins at 1950 for 12 countries). Each observation corresponds to a single country, and each variable is one year.

The second data set, called co2_pcap_cons, tracks tonnes (metric) of carbon dioxide emissions per capita for 194 countries. There are no NA values in this data set. Each observation corresponds to a single country, and each variable is one year.

The hypothesized relationship between the two variables is that as the amount of CO2 emissions per capita increases for a country, its child mortality rate will drop. While we are not investigating causality, we anticipate that increased CO2 emissions indicate increasing industrialization and modernization within a country, which tends to predict dropping levels of child mortality (Ranganathan et al., 2015).

Code
# Load in packages and .csv files

library(tidyverse)
library(broom)

child_mortality_data <- read.csv(here::here("child_mortality_0_5_year_olds_dying_per_1000_born.csv"))

co2_pcap_cons <- read.csv(here::here("co2_pcap_cons.csv"))

II. Methods

a. Data cleaning

After loading the necessary packages and reading in the two Gapminder files, both data sets were cleaned for ease of analysis. Each data set had its year variables re-designated as characters to standardize the variable type. The X’s in front of each year (carried over from Gapminder) were removed, and each year was then re-designated as a numeric type. Each data set was then pivoted into a long format so that each observation corresponded to a single country for a single year, along with their respective child_mortality and emissions values. These two data sets were named child_mort_long and co2_long.

Finally, these two cleaned data sets were joined into one data set called final_data, which includes both child_mortality and emissions values for 194 countries over the years of 1800 – 2022.

Code
# Convert co2_pcap_cons into long format

co2_long <- co2_pcap_cons %>%
  mutate(across(matches("^X(18|19|20)\\d{2}$"), as.character)) %>% 
  pivot_longer(
    cols = matches("^X(18|19|20)\\d{2}$"), 
    names_to = "year",
    values_to = "emissions"
  ) %>%
  mutate(
    year = str_remove(year, "^X"),
    year = as.numeric(year)
  )


# Convert child_mortality_data into long format

child_mort_long <- child_mortality_data %>%
  mutate(across(matches("^X(18|19|20|21)\\d{2}$"), as.character)) %>% 
  pivot_longer(
    cols = matches("^X(18|19|20|21)\\d{2}$"), 
    names_to = "year",
    values_to = "child_mortality"
  ) %>%
  mutate(
    year = str_remove(year, "^X"),
    year = as.numeric(year)
  )


# Join the data sets

final_data <- child_mort_long %>%
  inner_join(co2_long, by = c("year", "country")) %>%
  mutate(
    child_mortality = as.numeric(as.character(child_mortality)),
    emissions = as.numeric(as.character(emissions))
  )

b. Exploratory analysis

In order to get a sense of how these two variables relate over time, child_mortality and emissions (log transformed) for each country were averaged for each decade starting from the year 1953 through the year 2022. A new variable called Decade was created for this purpose. The resulting mean values are represented in the faceted plot below. Each point represents the mean values over a decade of child_mortality and emissions per capita for a single country.

Code
final_data <- final_data %>%
  mutate(Decade = cut(year, breaks = seq(1953, 2023, 10), 
                      labels = c("1953-1962", "1963-1972",
                                 "1973-1982", "1983-1992",
                                 "1993-2002", "2003-2012",
                                 "2013-2022"),
                      include.lowest = TRUE))


final_data_means <- final_data %>%
  filter(!is.na(Decade)) %>%
  group_by(country, Decade) %>%
  summarize(
    mean_emissions = mean(emissions, na.rm = TRUE),
    mean_child_mortality = mean(child_mortality, na.rm = TRUE),
    .groups = 'drop'
  )


ggplot(final_data_means, aes(x = mean_emissions,
                             y = mean_child_mortality)) +
  geom_point() +
  facet_wrap(Decade ~ ., scales = "free") + 
  labs(title = "Mean Child Mortality vs. Mean Emissions Over Decades (1953 - 2022)",
       x = "Mean Emissions Per Capita (Log Transformed)",
       y = "") +
  scale_x_log10() +
  theme_bw()

Code
# Un-logged
ggplot(final_data_means, aes(x = mean_emissions,
                             y = mean_child_mortality)) +
  geom_point() +
  facet_wrap(Decade ~ ., scales = "free") + 
  labs(title = "Mean Child Mortality vs. Mean Emissions Over Decades (1953 - 2022)",
       x = "Mean Emissions Per Capita",
       y = "") +
  theme_bw()

Each facet in the above plot clearly shows that the global correlation between child_mortality and emissions is negative in every decade since 1953. This consistent pattern supports the hypothesis of an inverse relationship between these variables and provides a solid foundation for further analysis.

III. Analysis

a. Linear regression

To evaluate the relationship between mean_child_mortality and mean_emissions, the final_data set was processed to compute yearly global averages for these variables, resulting in a dataset named yearly_means. Then, a scatterplot was generated, where each point represents the global averages for a given year. An overlaid line of best fit reveals a pronounced negative correlation, indicating that years with higher emissions are generally associated with lower child mortality rates, and vice versa.

Code
# NEED TO DISCUSS - THIS PLOT USES THE WRONG VARIABLES

final_data %>%
  group_by(country) %>%
  summarise(
    mean_child_mortality = mean(child_mortality, na.rm = TRUE),
    mean_emissions = mean(emissions, na.rm = TRUE)
  ) %>%
  ggplot(aes(x = mean_emissions, y = mean_child_mortality)) +
  geom_point() +
  theme_bw() +
  labs(x = "Mean Emissions (Log Transformed)", y = "", 
       title = "Scatter Plot of Mean Child Mortality vs Mean Emissions by Country") +
  geom_smooth(method = "lm", color = "blue") +
  scale_x_log10()

Code
yearly_means <- final_data %>%
  group_by(year) %>%
  summarise(
    mean_child_mortality = mean(child_mortality, na.rm = TRUE),
    mean_emissions = mean(emissions, na.rm = TRUE)
  )


yearly_means %>%
  mutate(period = 25 * floor((year - 1800) / 25) + 1800) %>%
  ggplot(aes(x = mean_emissions, y = mean_child_mortality, color = as.factor(period))) +
  geom_point() +
  theme_bw() +
  labs(x = "Mean Emissions (Log Transformed)", y = "Mean Child Mortality",
       title = "Scatter Plot of Mean Child Mortality vs Mean Emissions by Period",
       color = "Period") +
  geom_smooth(method = "lm", color = "blue") +
  scale_x_log10() +
  scale_color_viridis_d()

Code
# Un-logged
yearly_means %>%
  mutate(period = 25 * floor((year - 1800) / 25) + 1800) %>%
  ggplot(aes(x = mean_emissions, y = mean_child_mortality, color = as.factor(period))) +
  geom_point() +
  theme_bw() +
  labs(x = "Mean Emissions", y = "Mean Child Mortality",
       title = "Scatter Plot of Mean Child Mortality vs Mean Emissions by Period",
       color = "Period") +
  geom_smooth(method = "lm", color = "blue") +
  scale_color_viridis_d()

To extract the linear equation from the above plot, a linear model was created with mean_emissions as the explanatory variable, and mean_child_mortality as the response variable. The resulting table shows the linear analysis.

The regression equation for the model is \(\hat{y}=403.29298−74.17116⋅x\), where \(\hat{y}\) is equal to the estimated mean_child_mortality, and \(x\) is equal to the mean_emissions.

Code
final_data_lm <- lm(mean_child_mortality ~ mean_emissions,
                    data = yearly_means)

tidy(final_data_lm)
# A tibble: 2 × 5
  term           estimate std.error statistic   p.value
  <chr>             <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)       403.       2.88     140.  8.87e-218
2 mean_emissions    -74.2      1.17     -63.4 1.02e-143

Test

Code
yearly_means_log <- yearly_means %>%
  mutate(log_mean_emissions = log(mean_emissions))

final_data_lm_log <- lm(mean_child_mortality ~ log_mean_emissions, data = yearly_means_log)
Code
tidy(final_data_lm_log)
# A tibble: 2 × 5
  term               estimate std.error statistic   p.value
  <chr>                 <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)           240.       3.58      67.1 6.43e-149
2 log_mean_emissions    -85.4      2.17     -39.4 5.44e-102
Code
r_yearly_log <- yearly_means_log %>%
  summarize(pearsons_r = cor(mean_child_mortality,
                             log_mean_emissions, method = "pearson")) %>%
  pull(pearsons_r)
Code
final_data_lm_log %>%
  augment() %>%
  ggplot(aes(x = .resid)) +
    geom_histogram(aes(y = ..density..), binwidth = 1, colour = "black", fill = "white") +
    geom_density(alpha = .2, fill = "#FF6666") +
    labs(x = "Residuals", y = "Density",
         title = "Histogram of Residuals with Normal Density Curve") +
    theme_bw()

Code
final_data_lm_log %>%
  augment() %>%
  summarise(
    response_variance = var(mean_child_mortality, na.rm = TRUE),
    fitted_variance = var(.fitted, na.rm = TRUE),
    residuals_variance = var(.resid, na.rm = TRUE)
  ) %>%
  pivot_longer(
    everything(),
    names_to = "Metric",
    values_to = "Variance"
  ) %>%
  mutate(
    Metric = recode(Metric,
      'response_variance' = "Variance of Response",
      'fitted_variance' = "Variance of Fitted Values",
      'residuals_variance' = "Variance of Residuals"
    )
  )
# A tibble: 3 × 2
  Metric                    Variance
  <chr>                        <dbl>
1 Variance of Response        20559.
2 Variance of Fitted Values   18002.
3 Variance of Residuals        2558.
Code
fitted_values_log <- predict(final_data_lm_log)
st_error_log <- sigma(final_data_lm_log)

simulated_data_log <- data.frame(predicted = fitted_values_log) %>%
  mutate(simulated_child_mortality = exp(predicted + rnorm(n(), mean = 0, sd = st_error_log)))

ggplot(simulated_data_log, aes(x = predicted, y = simulated_child_mortality)) +
  geom_point()

End of Test


b. Assessing appropriateness of fit

In order to conclude with confidence that appropriateness of using a linear regression for this relationship, the four model conditions of linearity, independence, normality of distribution, and equality of variance must be assessed. Each will be discussed in turn:

Code
r_yearly <- yearly_means %>%
  summarize(pearsons_r = cor(mean_emissions,
                             mean_child_mortality, method = "pearson"))



glance(final_data_lm)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic   p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>     <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.948         0.948  32.8     4016. 1.02e-143     1 -1094. 2194. 2204.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

1. Linearity:
By looking at the above scatterplot from section III.a, a clear linear pattern is evident. The Pearson’s correlation coefficient (r) is -0.9735707. Given these observations, the first criteria failed to be violated.

2. Independence:
The data were intentionally aggregated to calculate global rates of child mortality and emissions per capita, averaging the values without regard to their country of origin. However, considering that each year’s data likely has a significant causal influence on the data of the following year, this model may be at risk of violating the independence assumption. The temporal nature of the data suggests that observations across years are not entirely independent, which could impact the validity of the independence assumption in our analysis.

3. Normality of residuals:
The histogram below displays the distribution of residuals from the model. The distribution is approximately centered around zero, suggesting that the model does not have systematic bias. The overall shape of the distribution approximates a bell curve, but with a pronounced peak (leptokurtosis) and a moderate right skew. Nonetheless, this does not necessarily violate the assumption of normality, especially considering the large sample size.

Code
final_data_lm |> 
  augment() |> 
ggplot(aes(x = .fitted, y = .resid)) +
  geom_point()

Code
# Histogram

final_data_lm %>%
  augment() %>%
  ggplot(aes(x = .resid)) +
    geom_histogram(aes(y = ..density..), binwidth = 1,
                   colour = "black", fill = "white") +
    geom_density(alpha = .2, fill = "#FF6666") +
    labs(x = "Residuals", y = "",
         title = "Histogram of Residuals with Normal Density Curve") +
    theme_bw()

4. Equal variance of residuals:
The scatterplot below illustrates the distribution of residuals against predicted values, providing insight into the homoscedasticity of the model. It appears that there is an increase in the spread of residuals as the predicted values rise. This pattern suggests a potential violation of the equal variance assumption, which could compromise the reliability of the linear regression analysis.

Code
final_data_lm %>%
  augment() %>%
  summarise(
    response_variance = var(mean_child_mortality, na.rm = TRUE),
    fitted_variance = var(.fitted, na.rm = TRUE),
    residuals_variance = var(.resid, na.rm = TRUE)
  ) %>%
  pivot_longer(
    everything(),
    names_to = "Metric",
    values_to = "Variance"
  ) %>%
  mutate(
    Metric = case_when(
      Metric == "response_variance" ~ "Variance of Response",
      Metric == "fitted_variance" ~ "Variance of Fitted Values",
      Metric == "residuals_variance" ~ "Variance of Residuals",
      TRUE ~ Metric # Fallback, should not be needed
    )
  )
# A tibble: 3 × 2
  Metric                    Variance
  <chr>                        <dbl>
1 Variance of Response        20559.
2 Variance of Fitted Values   19487.
3 Variance of Residuals        1072.
Code
predictions <- predict(final_data_lm)
st_error <- sigma(final_data_lm)

predictions %>%
  data.frame(predicted = .) %>%
  mutate(residuals = rnorm(n(), mean = 0, sd = st_error)) %>%
  ggplot(aes(x = predicted, y = residuals)) +
    geom_point() +
  theme_bw() +
  labs(x = "Predicted values", y = "",
       title = "Residuals vs. Predicted Values")

To be incorporated:
The Adjusted R-squared was 0.9476 for the data indicating that close to 95% of the variability in mean child mortality can be explained by mean CO2 emissions. This indicates that the model is a very good fit for the data. The only issue is the plot of the residuals, since the residuals do not have a random distribution and seem to follow a pattern. This could indicate that the relationship between CO2 emissions and child mortality is non-linear and could lead to issues with other tests on the data.

c. Simulation

Code
noise <- function(x, mean = 0, sd) {
  return(x + rnorm(length(x),
            mean,
            sd)
         )
}

simulations <- map_dfc(.x = 1:1000,
                       .f = ~tibble(simulation = noise(predictions,
                                                       sd = st_error)
                                    )
                       )
Code
colnames(simulations) <- colnames(simulations) |> 
  str_replace(pattern = "\\.\\.\\.",
                  replace = "_")

binded_simulations <- yearly_means |>
  filter(!is.na(mean_child_mortality),
         !is.na(mean_emissions)
         ) |>
  select(mean_child_mortality) |>
  bind_cols(simulations)

head(binded_simulations)
Code
simulated_r_sq <- binded_simulations |>
  map(~ lm(mean_child_mortality ~ .x, data = binded_simulations)
      ) |>
  map(glance) |>
  map_dbl(~ .x$r.squared)

head(simulated_r_sq)
mean_child_mortality         simulation_1         simulation_2 
           1.0000000            0.8907460            0.9015945 
        simulation_3         simulation_4         simulation_5 
           0.8925711            0.8958496            0.8992299 
Code
tibble(simulations = simulated_r_sq) |> 
  ggplot(aes(x = simulations)) + 
  geom_histogram(binwidth = 0.001, color = "black", fill = "skyblue") +
  labs(x = expression("Simulated"~ R^2),
       y = "",
       subtitle = "Number of Simulated Models") +
  theme_bw() +
  scale_y_continuous(breaks = seq(0,
                                  60,
                                  by = 2)
                     )

Conclusion

[text here]

References

Ranganathan et al., 2015 https://www.nature.com/articles/palcomms201533


Fixed components

Code
library(knitr)
library(kableExtra)

# Averaging across all years for each country

means_data <- final_data %>%
  group_by(country) %>%
  summarise(
    mean_child_mortality = mean(child_mortality, na.rm = TRUE),
    mean_emissions = mean(emissions, na.rm = TRUE)  
  ) %>%
  mutate(
    log_mean_emissions = log(mean_emissions)
  ) %>%
  ungroup()  



means_data_kable <- kable(means_data, "html", escape = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = F, position = "left") %>%
  scroll_box(width = "100%", height = "500px")


means_data_kable
country mean_child_mortality mean_emissions log_mean_emissions
Afghanistan 386.25740 0.0466368 -3.0653660
Albania 273.38637 0.4961345 -0.7009082
Algeria 337.47623 0.7639058 -0.2693108
Andorra 19.21055 2.7854933 1.0244250
Angola 389.17399 0.2111839 -1.5550262
Antigua and Barbuda 198.28354 1.5734978 0.4533010
Argentina 236.32422 1.4339731 0.3604490
Armenia 229.97534 0.7998879 -0.2232837
Australia 164.54354 5.2926009 1.6663098
Austria 225.58197 4.2965874 1.4578211
Azerbaijan 240.90448 1.7380404 0.5527582
Bahamas 182.38238 3.5361121 1.2630278
Bahrain 311.77955 4.4071166 1.4832206
Bangladesh 387.63767 0.0727758 -2.6203720
Barbados 365.35740 0.9733587 -0.0270026
Belarus 223.34018 2.0042735 0.6952817
Belgium 166.45704 8.4054709 2.1288828
Belize 215.76906 0.4507937 -0.7967454
Benin 349.71256 0.1069013 -2.2358489
Bhutan 349.52960 0.1370269 -1.9875780
Bolivia 316.47354 0.3524843 -1.0427492
Bosnia and Herzegovina 259.09130 1.2985919 0.2612805
Botswana 291.26413 0.5983901 -0.5135123
Brazil 297.82691 0.5622511 -0.5758067
Brunei 289.50368 6.3411614 1.8470619
Bulgaria 221.15511 1.8445516 0.6122362
Burkina Faso 374.30762 0.0302556 -3.4980738
Burundi 338.23632 0.0125561 -4.3775524
Cambodia 301.53453 0.1707220 -1.7677189
Cameroon 361.40762 0.0793677 -2.5336636
Canada 180.01933 7.2390493 1.9794899
Cape Verde 292.17982 0.3029641 -1.1941409
Central African Republic 356.56054 0.0342915 -3.3728584
Chad 356.25112 0.0196637 -3.9289821
Chile 299.68090 1.1648655 0.1526056
China 305.78283 0.8371480 -0.1777544
Colombia 274.75874 0.5669058 -0.5675621
Comoros 329.61973 0.0583049 -2.8420686
Congo, Dem. Rep. 341.65291 0.0352735 -3.3446221
Congo, Rep. 306.12960 0.1779821 -1.7260725
Costa Rica 278.65794 0.5615336 -0.5770836
Cote d'Ivoire 346.98834 0.1607130 -1.8281351
Croatia 238.03789 1.3277578 0.2834917
Cuba 239.56677 0.7723139 -0.2583642
Cyprus 209.90538 1.9542960 0.6700300
Czech Republic 220.15224 5.2377982 1.6559012
Denmark 147.32538 4.5645919 1.5183291
Djibouti 342.72825 0.2611031 -1.3428398
Dominica 53.66575 0.3456143 -1.0624317
Dominican Republic 312.21973 0.5868655 -0.5329597
Ecuador 292.73498 0.5138206 -0.6658810
Egypt 317.43543 0.4625695 -0.7709584
El Salvador 326.72377 0.2956413 -1.2186085
Equatorial Guinea 359.86816 0.7619910 -0.2718205
Eritrea 345.57489 0.0729238 -2.6183407
Estonia 205.93753 4.4824439 1.5001684
Eswatini 313.37937 0.2373857 -1.4380692
Ethiopia 357.23049 0.0209955 -3.8634464
Fiji 329.91076 0.4592780 -0.7780995
Finland 186.78229 3.8995740 1.3608673
France 162.37700 4.0272780 1.3930907
Gabon 325.99283 1.2170448 0.1964257
Gambia 372.29821 0.0687848 -2.6767732
Georgia 233.11040 0.8167220 -0.2024565
Germany 229.78448 6.3502646 1.8484965
Ghana 349.55471 0.1608565 -1.8272426
Greece 208.14655 1.8874978 0.6352520
Grenada 206.47848 0.3923049 -0.9357159
Guatemala 363.13229 0.2333453 -1.4552360
Guinea 369.68475 0.0740448 -2.6030844
Guinea-Bissau 348.37444 0.0464484 -3.0694126
Guyana 271.49417 0.7439776 -0.2957444
Haiti 375.11749 0.0527175 -2.9428080
Honduras 288.90807 0.2420179 -1.4187434
Hong Kong, China 21.71986 2.9010045 1.0650571
Hungary 240.81704 2.6974664 0.9923130
Iceland 180.74794 2.8275247 1.0394017
India 351.83229 0.2609552 -1.3434067
Indonesia 327.94081 0.3532466 -1.0405888
Iran 377.38341 1.1978430 0.1805225
Iraq 313.98161 1.1182063 0.1117259
Ireland 152.76081 3.4560807 1.2401352
Israel 253.29700 2.6546502 0.9763129
Italy 243.21148 2.5398744 0.9321146
Jamaica 230.29552 0.8638565 -0.1463486
Japan 219.69170 3.0719865 1.1223244
Jordan 298.61031 0.9714664 -0.0289486
Kazakhstan 317.96915 4.3394350 1.4677442
Kenya 373.66502 0.1628700 -1.8148032
Kiribati 272.47489 0.1575874 -1.8477748
Kuwait 342.07493 9.7526906 2.2775432
Kyrgyz Republic 263.57175 1.1759193 0.1620502
Lao 329.20762 0.1677175 -1.7854743
Latvia 217.02982 1.9157354 0.6501016
Lebanon 299.93682 0.8933408 -0.1127871
Lesotho 309.94036 0.2969283 -1.2142647
Liberia 353.10852 0.1338700 -2.0108864
Libya 302.45426 2.2372018 0.8052259
Liechtenstein 21.24575 2.5136143 0.9217217
Lithuania 251.93484 2.4752242 0.9063310
Luxembourg 202.32336 9.7173722 2.2739152
Madagascar 338.22915 0.0517175 -2.9619593
Malawi 366.96682 0.0703812 -2.6538296
Malaysia 291.41170 1.2340269 0.2102827
Maldives 324.39493 0.3335291 -1.0980250
Mali 412.50090 0.0243901 -3.7135766
Malta 224.87215 2.7239507 1.0020833
Marshall Islands 67.14795 0.5554978 -0.5878907
Mauritania 326.60269 0.1430628 -1.9444717
Mauritius 242.85650 0.8357713 -0.1794003
Mexico 326.68655 1.3734305 0.3173116
Micronesia, Fed. Sts. 215.15202 0.3242287 -1.1263061
Moldova 231.82287 1.2841704 0.2501129
Mongolia 318.70000 1.4798117 0.3919148
Montenegro 246.16987 0.8760897 -0.1322868
Morocco 303.78565 0.3517982 -1.0446975
Mozambique 357.83274 0.1464305 -1.9212044
Myanmar 333.33946 0.1638206 -1.8089832
Namibia 293.19731 0.4525381 -0.7928833
Nauru 69.70000 3.4392242 1.2352459
Nepal 325.18072 0.0465112 -3.0680619
Netherlands 164.92390 4.9606682 1.6015404
New Zealand 144.92924 3.2775067 1.1870830
Nicaragua 357.10045 0.2547892 -1.3673186
Niger 370.58296 0.0228610 -3.7783235
Nigeria 363.45291 0.1572511 -1.8499113
North Korea 332.15157 1.2094664 0.1901792
North Macedonia 258.76874 1.7188565 0.5416592
Norway 122.41036 3.1672063 1.1528499
Oman 322.01018 2.2998296 0.8328350
Pakistan 382.93812 0.1901839 -1.6597640
Palau 37.85342 3.6630762 1.2983033
Palestine 307.80807 0.1181749 -2.1355897
Panama 271.92108 0.4641250 -0.7676014
Papua New Guinea 315.19013 0.1545830 -1.8670244
Paraguay 254.11031 0.2817265 -1.2668187
Peru 282.35157 0.5690000 -0.5638748
Philippines 298.00135 0.3092466 -1.1736161
Poland 231.20291 3.6753722 1.3016544
Portugal 235.22480 1.6005650 0.4703567
Qatar 297.75655 7.3184753 1.9904020
Romania 249.18300 1.7500807 0.5596619
Russia 269.54960 3.4704484 1.2442838
Rwanda 329.05605 0.0239462 -3.7319461
Samoa 218.51166 0.2134529 -1.5443390
Sao Tome and Principe 254.96816 0.1302287 -2.0384631
Saudi Arabia 307.46054 3.9673274 1.3780927
Senegal 395.24439 0.1673049 -1.7879372
Serbia 217.43511 1.5843004 0.4601430
Seychelles 181.79686 0.9933677 -0.0066544
Sierra Leone 429.65291 0.0647040 -2.7379317
Singapore 263.00704 6.4327489 1.8614020
Slovak Republic 225.74915 4.5391211 1.5127334
Slovenia 222.98291 2.5129238 0.9214469
Solomon Islands 357.53587 0.1097085 -2.2099282
Somalia 362.48879 0.0270852 -3.6087678
South Africa 293.02466 2.5586906 0.9394956
South Korea 334.21139 2.1194753 0.7511686
South Sudan 398.13184 0.0370448 -3.2956261
Spain 231.76623 1.9819283 0.6840702
Sri Lanka 272.26969 0.2886413 -1.2425707
St. Kitts and Nevis 59.94247 0.8058386 -0.2158718
St. Lucia 220.13901 0.4149238 -0.8796605
St. Vincent and the Grenadines 230.02691 0.3193408 -1.1414964
Sudan 319.03094 0.1166996 -2.1481526
Suriname 273.07309 1.4609686 0.3790996
Sweden 128.63444 3.7264305 1.3154508
Switzerland 166.26103 4.1204439 1.4159609
Syria 303.85067 0.6301390 -0.4618148
Taiwan 296.09933 2.4094574 0.8794016
Tajikistan 255.55426 0.5052063 -0.6827885
Tanzania 325.12556 0.0730628 -2.6164362
Thailand 302.30291 0.5908296 -0.5262276
Timor-Leste 363.87668 0.0569910 -2.8648614
Togo 336.44709 0.0909552 -2.3973887
Tonga 190.89417 0.2493049 -1.3890785
Trinidad and Tobago 223.89283 4.8734843 1.5838091
Tunisia 338.26951 0.4442825 -0.8112946
Turkey 290.41529 0.9963004 -0.0037064
Turkmenistan 330.32287 4.0312960 1.3940879
Tuvalu 68.19726 0.2638565 -1.3323499
UAE 309.10951 5.5005561 1.7048492
UK 153.93148 8.0481614 2.0854437
USA 177.84022 10.0326951 2.3058493
Uganda 386.88161 0.0422108 -3.1650801
Ukraine 239.93363 2.4806592 0.9085243
Uruguay 242.11081 0.7128924 -0.3384248
Uzbekistan 269.65426 1.9439327 0.6647131
Vanuatu 218.91480 0.2458475 -1.4030437
Venezuela 269.91480 1.9177803 0.6511684
Vietnam 287.26144 0.2582825 -1.3537013
Yemen 422.88161 0.2103677 -1.5588983
Zambia 319.54350 0.5254260 -0.6435459
Zimbabwe 292.04709 0.6607399 -0.4143950
Code
# Over time scatterplot

final_data <- final_data %>%
  mutate(Decade = cut(year, breaks = seq(1953, 2023, 10), 
                      labels = c("1953-1962", "1963-1972",
                                 "1973-1982", "1983-1992",
                                 "1993-2002", "2003-2012",
                                 "2013-2022"),
                      include.lowest = TRUE))


final_data_means <- final_data %>%
  filter(!is.na(Decade)) %>%
  group_by(country, Decade) %>%
  summarize(
    mean_emissions = mean(emissions, na.rm = TRUE),
    mean_child_mortality = mean(child_mortality, na.rm = TRUE),
    .groups = 'drop'
  )


ggplot(final_data_means, aes(x = mean_emissions,
                             y = mean_child_mortality)) +
  geom_point() +
  facet_wrap(Decade ~ ., scales = "free") + 
  labs(title = "Mean Child Mortality vs. Mean Emissions Over Decades (1953 - 2022)",
       x = "Mean Emissions Per Capita (Log Transformed)",
       y = "") +
  scale_x_log10() +
  theme_bw()

Code
# Snapshot Scatterplot

means_data %>%
  ggplot(aes(x = log_mean_emissions, y = mean_child_mortality)) +
  geom_point() +
  theme_bw() +
  labs(x = "Log of Mean Emissions", y = "", 
       title = "Scatter Plot of Mean Child Mortality vs Log of Mean Emissions by Country") +
  geom_smooth(method = "lm", color = "blue")

Code
# New linear model

means_data_lm <- lm(mean_child_mortality ~ log_mean_emissions,
                    data = means_data)

tidy(means_data_lm)
# A tibble: 2 × 5
  term               estimate std.error statistic   p.value
  <chr>                 <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)           260.       4.93      52.7 4.32e-116
2 log_mean_emissions    -31.3      2.98     -10.5 1.07e- 20
Code
# New scatterplot and histogram for residuals

means_data_lm %>%  
  augment() %>%  
ggplot(aes(x = .fitted, y = .resid)) +
  geom_point()

Code
means_data_lm %>%
  augment() %>%
  ggplot(aes(x = .resid)) +
    geom_histogram(aes(y = ..density..), colour = "black", fill = "white") + # Removed binwidth
    geom_density(alpha = .2, fill = "#FF6666") +
    labs(x = "Residuals", y = "Density",
         title = "Histogram of Residuals with Normal Density Curve") +
    theme_bw()

Code
# New variances table

means_data_lm %>%
  augment() %>%
  summarise(
    response_variance = var(mean_child_mortality, na.rm = TRUE),
    fitted_variance = var(.fitted, na.rm = TRUE),
    residuals_variance = var(.resid, na.rm = TRUE)
  ) %>%
  pivot_longer(
    everything(),
    names_to = "Metric",
    values_to = "Variance"
  ) %>%
  mutate(
    Metric = case_when(
      Metric == "response_variance" ~ "Variance of Response",
      Metric == "fitted_variance" ~ "Variance of Fitted Values",
      Metric == "residuals_variance" ~ "Variance of Residuals",
      TRUE ~ Metric # Fallback, should not be needed
    )
  )
# A tibble: 3 × 2
  Metric                    Variance
  <chr>                        <dbl>
1 Variance of Response         6730.
2 Variance of Fitted Values    2458.
3 Variance of Residuals        4272.
Code
predictions_means <- predict(means_data_lm)
st_error_means <- sigma(means_data_lm)

predictions_means %>%
  data.frame(predicted = .) %>%
  mutate(residuals = rnorm(n(), mean = 0, sd = st_error_means)) %>%
  ggplot(aes(x = predicted, y = residuals)) +
    geom_point() +
  theme_bw() +
  labs(x = "Predicted values", y = "",
       title = "Residuals vs. Predicted Values")

Start of new simulation

Code
predictions_means <- predict(means_data_lm)

# Generate simulated residuals
simulated_residuals_means <- rnorm(length(predictions_means), mean = 0, sd = sigma(means_data_lm))

# Create a dataframe with predicted values and simulated residuals
simulated_data_means <- data.frame(predicted = predictions_means, residuals = simulated_residuals_means)

# Plot simulated residuals vs predicted values to assess variance
ggplot(simulated_data_means, aes(x = predicted, y = residuals)) +
  geom_point() +
  theme_bw() +
  labs(x = "Predicted Values", y = "Simulated Response Values",
       title = "Simulated Response Values vs. Predicted Values")

Code
noise <- function(x, mean = 0, sd) {
  return(x + rnorm(length(x), mean, sd))
}

# Generate 1000 simulations
simulations_means <- map_dfc(.x = 1:1000,
                       .f = ~tibble(simulation = noise(predictions_means,
                                                       sd = st_error_means)
                                    )
                       )
Code
colnames(simulations_means) <- str_replace(colnames(simulations_means), pattern = "\\.{3}", replace = "_")

binded_simulations_means <- means_data %>%
  filter(!is.na(mean_child_mortality), !is.na(log_mean_emissions)) %>%
  select(country, mean_child_mortality, log_mean_emissions) %>%
  bind_cols(simulations_means)


head(binded_simulations_means)
# A tibble: 6 × 1,003
  country      mean_child_mortality log_mean_emissions simulation_1 simulation_2
  <chr>                       <dbl>              <dbl>        <dbl>        <dbl>
1 Afghanistan                 386.              -3.07          346.        406. 
2 Albania                     273.              -0.701         341.        295. 
3 Algeria                     337.              -0.269         169.        297. 
4 Andorra                      19.2              1.02          305.        293. 
5 Angola                      389.              -1.56          407.        206. 
6 Antigua and…                198.               0.453         260.         77.7
# ℹ 998 more variables: simulation_3 <dbl>, simulation_4 <dbl>,
#   simulation_5 <dbl>, simulation_6 <dbl>, simulation_7 <dbl>,
#   simulation_8 <dbl>, simulation_9 <dbl>, simulation_10 <dbl>,
#   simulation_11 <dbl>, simulation_12 <dbl>, simulation_13 <dbl>,
#   simulation_14 <dbl>, simulation_15 <dbl>, simulation_16 <dbl>,
#   simulation_17 <dbl>, simulation_18 <dbl>, simulation_19 <dbl>,
#   simulation_20 <dbl>, simulation_21 <dbl>, simulation_22 <dbl>, …
Code
simulation_cols_means <- names(binded_simulations_means)[-(1:3)]

simulated_r_sq_means <- map_dbl(simulation_cols_means, ~ {
  formula <- as.formula(paste("mean_child_mortality ~", .x))
  model <- lm(formula, data = binded_simulations_means)
  summary(model)$r.squared
})

tibble(simulations = simulated_r_sq_means) %>%
  ggplot(aes(x = simulations)) + 
    geom_histogram(binwidth = 0.001, color = "black", fill = "skyblue") +
    labs(x = expression("Simulated"~ R^2),
         y = "Frequency",
         subtitle = "Histogram of Simulated R-squared Values") +
    theme_bw() +
    scale_y_continuous()

Everything in one chunk for GPT

Code
library(knitr)
library(kableExtra)

# Averaging across all years for each country

means_data <- final_data %>%
  group_by(country) %>%
  summarise(
    mean_child_mortality = mean(child_mortality, na.rm = TRUE),
    mean_emissions = mean(emissions, na.rm = TRUE)  
  ) %>%
  mutate(
    log_mean_emissions = log(mean_emissions)
  ) %>%
  ungroup()  


means_data_kable <- kable(means_data, "html", escape = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = F, position = "left") %>%
  scroll_box(width = "100%", height = "500px")


means_data_kable
country mean_child_mortality mean_emissions log_mean_emissions
Afghanistan 386.25740 0.0466368 -3.0653660
Albania 273.38637 0.4961345 -0.7009082
Algeria 337.47623 0.7639058 -0.2693108
Andorra 19.21055 2.7854933 1.0244250
Angola 389.17399 0.2111839 -1.5550262
Antigua and Barbuda 198.28354 1.5734978 0.4533010
Argentina 236.32422 1.4339731 0.3604490
Armenia 229.97534 0.7998879 -0.2232837
Australia 164.54354 5.2926009 1.6663098
Austria 225.58197 4.2965874 1.4578211
Azerbaijan 240.90448 1.7380404 0.5527582
Bahamas 182.38238 3.5361121 1.2630278
Bahrain 311.77955 4.4071166 1.4832206
Bangladesh 387.63767 0.0727758 -2.6203720
Barbados 365.35740 0.9733587 -0.0270026
Belarus 223.34018 2.0042735 0.6952817
Belgium 166.45704 8.4054709 2.1288828
Belize 215.76906 0.4507937 -0.7967454
Benin 349.71256 0.1069013 -2.2358489
Bhutan 349.52960 0.1370269 -1.9875780
Bolivia 316.47354 0.3524843 -1.0427492
Bosnia and Herzegovina 259.09130 1.2985919 0.2612805
Botswana 291.26413 0.5983901 -0.5135123
Brazil 297.82691 0.5622511 -0.5758067
Brunei 289.50368 6.3411614 1.8470619
Bulgaria 221.15511 1.8445516 0.6122362
Burkina Faso 374.30762 0.0302556 -3.4980738
Burundi 338.23632 0.0125561 -4.3775524
Cambodia 301.53453 0.1707220 -1.7677189
Cameroon 361.40762 0.0793677 -2.5336636
Canada 180.01933 7.2390493 1.9794899
Cape Verde 292.17982 0.3029641 -1.1941409
Central African Republic 356.56054 0.0342915 -3.3728584
Chad 356.25112 0.0196637 -3.9289821
Chile 299.68090 1.1648655 0.1526056
China 305.78283 0.8371480 -0.1777544
Colombia 274.75874 0.5669058 -0.5675621
Comoros 329.61973 0.0583049 -2.8420686
Congo, Dem. Rep. 341.65291 0.0352735 -3.3446221
Congo, Rep. 306.12960 0.1779821 -1.7260725
Costa Rica 278.65794 0.5615336 -0.5770836
Cote d'Ivoire 346.98834 0.1607130 -1.8281351
Croatia 238.03789 1.3277578 0.2834917
Cuba 239.56677 0.7723139 -0.2583642
Cyprus 209.90538 1.9542960 0.6700300
Czech Republic 220.15224 5.2377982 1.6559012
Denmark 147.32538 4.5645919 1.5183291
Djibouti 342.72825 0.2611031 -1.3428398
Dominica 53.66575 0.3456143 -1.0624317
Dominican Republic 312.21973 0.5868655 -0.5329597
Ecuador 292.73498 0.5138206 -0.6658810
Egypt 317.43543 0.4625695 -0.7709584
El Salvador 326.72377 0.2956413 -1.2186085
Equatorial Guinea 359.86816 0.7619910 -0.2718205
Eritrea 345.57489 0.0729238 -2.6183407
Estonia 205.93753 4.4824439 1.5001684
Eswatini 313.37937 0.2373857 -1.4380692
Ethiopia 357.23049 0.0209955 -3.8634464
Fiji 329.91076 0.4592780 -0.7780995
Finland 186.78229 3.8995740 1.3608673
France 162.37700 4.0272780 1.3930907
Gabon 325.99283 1.2170448 0.1964257
Gambia 372.29821 0.0687848 -2.6767732
Georgia 233.11040 0.8167220 -0.2024565
Germany 229.78448 6.3502646 1.8484965
Ghana 349.55471 0.1608565 -1.8272426
Greece 208.14655 1.8874978 0.6352520
Grenada 206.47848 0.3923049 -0.9357159
Guatemala 363.13229 0.2333453 -1.4552360
Guinea 369.68475 0.0740448 -2.6030844
Guinea-Bissau 348.37444 0.0464484 -3.0694126
Guyana 271.49417 0.7439776 -0.2957444
Haiti 375.11749 0.0527175 -2.9428080
Honduras 288.90807 0.2420179 -1.4187434
Hong Kong, China 21.71986 2.9010045 1.0650571
Hungary 240.81704 2.6974664 0.9923130
Iceland 180.74794 2.8275247 1.0394017
India 351.83229 0.2609552 -1.3434067
Indonesia 327.94081 0.3532466 -1.0405888
Iran 377.38341 1.1978430 0.1805225
Iraq 313.98161 1.1182063 0.1117259
Ireland 152.76081 3.4560807 1.2401352
Israel 253.29700 2.6546502 0.9763129
Italy 243.21148 2.5398744 0.9321146
Jamaica 230.29552 0.8638565 -0.1463486
Japan 219.69170 3.0719865 1.1223244
Jordan 298.61031 0.9714664 -0.0289486
Kazakhstan 317.96915 4.3394350 1.4677442
Kenya 373.66502 0.1628700 -1.8148032
Kiribati 272.47489 0.1575874 -1.8477748
Kuwait 342.07493 9.7526906 2.2775432
Kyrgyz Republic 263.57175 1.1759193 0.1620502
Lao 329.20762 0.1677175 -1.7854743
Latvia 217.02982 1.9157354 0.6501016
Lebanon 299.93682 0.8933408 -0.1127871
Lesotho 309.94036 0.2969283 -1.2142647
Liberia 353.10852 0.1338700 -2.0108864
Libya 302.45426 2.2372018 0.8052259
Liechtenstein 21.24575 2.5136143 0.9217217
Lithuania 251.93484 2.4752242 0.9063310
Luxembourg 202.32336 9.7173722 2.2739152
Madagascar 338.22915 0.0517175 -2.9619593
Malawi 366.96682 0.0703812 -2.6538296
Malaysia 291.41170 1.2340269 0.2102827
Maldives 324.39493 0.3335291 -1.0980250
Mali 412.50090 0.0243901 -3.7135766
Malta 224.87215 2.7239507 1.0020833
Marshall Islands 67.14795 0.5554978 -0.5878907
Mauritania 326.60269 0.1430628 -1.9444717
Mauritius 242.85650 0.8357713 -0.1794003
Mexico 326.68655 1.3734305 0.3173116
Micronesia, Fed. Sts. 215.15202 0.3242287 -1.1263061
Moldova 231.82287 1.2841704 0.2501129
Mongolia 318.70000 1.4798117 0.3919148
Montenegro 246.16987 0.8760897 -0.1322868
Morocco 303.78565 0.3517982 -1.0446975
Mozambique 357.83274 0.1464305 -1.9212044
Myanmar 333.33946 0.1638206 -1.8089832
Namibia 293.19731 0.4525381 -0.7928833
Nauru 69.70000 3.4392242 1.2352459
Nepal 325.18072 0.0465112 -3.0680619
Netherlands 164.92390 4.9606682 1.6015404
New Zealand 144.92924 3.2775067 1.1870830
Nicaragua 357.10045 0.2547892 -1.3673186
Niger 370.58296 0.0228610 -3.7783235
Nigeria 363.45291 0.1572511 -1.8499113
North Korea 332.15157 1.2094664 0.1901792
North Macedonia 258.76874 1.7188565 0.5416592
Norway 122.41036 3.1672063 1.1528499
Oman 322.01018 2.2998296 0.8328350
Pakistan 382.93812 0.1901839 -1.6597640
Palau 37.85342 3.6630762 1.2983033
Palestine 307.80807 0.1181749 -2.1355897
Panama 271.92108 0.4641250 -0.7676014
Papua New Guinea 315.19013 0.1545830 -1.8670244
Paraguay 254.11031 0.2817265 -1.2668187
Peru 282.35157 0.5690000 -0.5638748
Philippines 298.00135 0.3092466 -1.1736161
Poland 231.20291 3.6753722 1.3016544
Portugal 235.22480 1.6005650 0.4703567
Qatar 297.75655 7.3184753 1.9904020
Romania 249.18300 1.7500807 0.5596619
Russia 269.54960 3.4704484 1.2442838
Rwanda 329.05605 0.0239462 -3.7319461
Samoa 218.51166 0.2134529 -1.5443390
Sao Tome and Principe 254.96816 0.1302287 -2.0384631
Saudi Arabia 307.46054 3.9673274 1.3780927
Senegal 395.24439 0.1673049 -1.7879372
Serbia 217.43511 1.5843004 0.4601430
Seychelles 181.79686 0.9933677 -0.0066544
Sierra Leone 429.65291 0.0647040 -2.7379317
Singapore 263.00704 6.4327489 1.8614020
Slovak Republic 225.74915 4.5391211 1.5127334
Slovenia 222.98291 2.5129238 0.9214469
Solomon Islands 357.53587 0.1097085 -2.2099282
Somalia 362.48879 0.0270852 -3.6087678
South Africa 293.02466 2.5586906 0.9394956
South Korea 334.21139 2.1194753 0.7511686
South Sudan 398.13184 0.0370448 -3.2956261
Spain 231.76623 1.9819283 0.6840702
Sri Lanka 272.26969 0.2886413 -1.2425707
St. Kitts and Nevis 59.94247 0.8058386 -0.2158718
St. Lucia 220.13901 0.4149238 -0.8796605
St. Vincent and the Grenadines 230.02691 0.3193408 -1.1414964
Sudan 319.03094 0.1166996 -2.1481526
Suriname 273.07309 1.4609686 0.3790996
Sweden 128.63444 3.7264305 1.3154508
Switzerland 166.26103 4.1204439 1.4159609
Syria 303.85067 0.6301390 -0.4618148
Taiwan 296.09933 2.4094574 0.8794016
Tajikistan 255.55426 0.5052063 -0.6827885
Tanzania 325.12556 0.0730628 -2.6164362
Thailand 302.30291 0.5908296 -0.5262276
Timor-Leste 363.87668 0.0569910 -2.8648614
Togo 336.44709 0.0909552 -2.3973887
Tonga 190.89417 0.2493049 -1.3890785
Trinidad and Tobago 223.89283 4.8734843 1.5838091
Tunisia 338.26951 0.4442825 -0.8112946
Turkey 290.41529 0.9963004 -0.0037064
Turkmenistan 330.32287 4.0312960 1.3940879
Tuvalu 68.19726 0.2638565 -1.3323499
UAE 309.10951 5.5005561 1.7048492
UK 153.93148 8.0481614 2.0854437
USA 177.84022 10.0326951 2.3058493
Uganda 386.88161 0.0422108 -3.1650801
Ukraine 239.93363 2.4806592 0.9085243
Uruguay 242.11081 0.7128924 -0.3384248
Uzbekistan 269.65426 1.9439327 0.6647131
Vanuatu 218.91480 0.2458475 -1.4030437
Venezuela 269.91480 1.9177803 0.6511684
Vietnam 287.26144 0.2582825 -1.3537013
Yemen 422.88161 0.2103677 -1.5588983
Zambia 319.54350 0.5254260 -0.6435459
Zimbabwe 292.04709 0.6607399 -0.4143950
Code
final_data <- final_data %>%
  mutate(Decade = cut(year, breaks = seq(1953, 2023, 10), 
                      labels = c("1953-1962", "1963-1972",
                                 "1973-1982", "1983-1992",
                                 "1993-2002", "2003-2012",
                                 "2013-2022"),
                      include.lowest = TRUE))


final_data_means <- final_data %>%
  filter(!is.na(Decade)) %>%
  group_by(country, Decade) %>%
  summarize(
    mean_emissions = mean(emissions, na.rm = TRUE),
    mean_child_mortality = mean(child_mortality, na.rm = TRUE),
    .groups = 'drop'
  )


ggplot(final_data_means, aes(x = mean_emissions,
                             y = mean_child_mortality)) +
  geom_point() +
  facet_wrap(Decade ~ ., scales = "free") + 
  labs(title = "Mean Child Mortality vs. Mean Emissions Over Decades (1953 - 2022)",
       x = "Mean Emissions Per Capita (Log Transformed)",
       y = "") +
  scale_x_log10() +
  theme_bw()

Code
# Snapshot Scatterplot

means_data %>%
  ggplot(aes(x = log_mean_emissions, y = mean_child_mortality)) +
  geom_point() +
  theme_bw() +
  labs(x = "Log of Mean Emissions", y = "", 
       title = "Scatter Plot of Mean Child Mortality vs Log of Mean Emissions by Country") +
  geom_smooth(method = "lm", color = "blue")

Code
# New linear model

means_data_lm <- lm(mean_child_mortality ~ log_mean_emissions,
                    data = means_data)

tidy(means_data_lm)
# A tibble: 2 × 5
  term               estimate std.error statistic   p.value
  <chr>                 <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)           260.       4.93      52.7 4.32e-116
2 log_mean_emissions    -31.3      2.98     -10.5 1.07e- 20
Code
# New scatterplot and histogram for residuals

means_data_lm %>%  
  augment() %>%  
ggplot(aes(x = .fitted, y = .resid)) +
  geom_point()

Code
means_data_lm %>%
  augment() %>%
  ggplot(aes(x = .resid)) +
    geom_histogram(aes(y = ..density..), colour = "black", fill = "white") + # Removed binwidth
    geom_density(alpha = .2, fill = "#FF6666") +
    labs(x = "Residuals", y = "Density",
         title = "Histogram of Residuals with Normal Density Curve") +
    theme_bw()

Code
# New variances table

means_data_lm %>%
  augment() %>%
  summarise(
    response_variance = var(mean_child_mortality, na.rm = TRUE),
    fitted_variance = var(.fitted, na.rm = TRUE),
    residuals_variance = var(.resid, na.rm = TRUE)
  ) %>%
  pivot_longer(
    everything(),
    names_to = "Metric",
    values_to = "Variance"
  ) %>%
  mutate(
    Metric = case_when(
      Metric == "response_variance" ~ "Variance of Response",
      Metric == "fitted_variance" ~ "Variance of Fitted Values",
      Metric == "residuals_variance" ~ "Variance of Residuals",
      TRUE ~ Metric # Fallback, should not be needed
    )
  )
# A tibble: 3 × 2
  Metric                    Variance
  <chr>                        <dbl>
1 Variance of Response         6730.
2 Variance of Fitted Values    2458.
3 Variance of Residuals        4272.
Code
predictions_means <- predict(means_data_lm)

# Generate simulated residuals
simulated_residuals_means <- rnorm(length(predictions_means), mean = 0, sd = sigma(means_data_lm))

# Create a dataframe with predicted values and simulated residuals
simulated_data_means <- data.frame(predicted = predictions_means, residuals = simulated_residuals_means)

# Plot simulated residuals vs predicted values to assess variance
ggplot(simulated_data_means, aes(x = predicted, y = residuals)) +
  geom_point() +
  theme_bw() +
  labs(x = "Predicted Values", y = "Simulated Response Values",
       title = "Simulated Response Values vs. Predicted Values")

Code
noise <- function(x, mean = 0, sd) {
  return(x + rnorm(length(x), mean, sd))
}
Code
# Generate 1000 simulations
st_error_means <- sigma(means_data_lm)

# Simulate the response variable
simulated_response_means <- map_dfc(.x = 1:1000, .f = ~ {
  tibble(
    simulated_child_mortality = noise(
      x = predict(means_data_lm, means_data),  # Use the original model predictions as a base
      sd = st_error_means                      # Add noise based on the model's standard error
    )
  )
})

# Now, bind the simulated response with the original predictor
binded_simulations_means <- means_data %>%
  select(country, log_mean_emissions) %>%
  bind_cols(simulated_response_means)

# Iterate over each simulation to calculate R-squared values
new_col_names <- c("country", "log_mean_emissions", paste("simulated_child_mortality", 1:1000, sep = "_"))
names(binded_simulations_means) <- new_col_names

# Now we can run the map_dbl to calculate R-squared values
simulated_r_sq_means <- map_dbl(1:1000, ~ {
  # Extract the simulated response for this iteration
  col_name <- paste("simulated_child_mortality", .x, sep = "_")
  # Fit the linear model using the simulated response
  model <- lm(reformulate("log_mean_emissions", response = col_name), data = binded_simulations_means)
  # Extract R-squared value
  summary(model)$r.squared
})

# Plotting the histogram of R-squared values
tibble(simulations = simulated_r_sq_means) %>%
  ggplot(aes(x = simulations)) + 
    geom_histogram(binwidth = 0.001, color = "black", fill = "skyblue") +
    labs(x = expression("Simulated"~ R^2),
         y = "Frequency",
         subtitle = "Histogram of Simulated R-squared Values") +
    theme_bw() +
    scale_y_continuous()